ABSTRACT

Recent work suggests that TCP throughput stability and predictability within a video viewing session can inform the design of better video bitrate adaptation algorithms. Despite a rich tradition of Internet measurement, however, our understanding of throughput stability and predictability is quite limited. To bridge this gap, we present a measurement study of throughput stability using a large-scale dataset from a video service provider. Drawing on this analysis, we propose a simple-but-effective prediction mechanism based on a hidden Markov model and demonstrate that it outperforms other approaches. We also show its practical implications for improving the user experience of adaptive video streaming.

1 Introduction 


In recent years, we have seen a dramatic rise in the volume of HTTP-based adaptive video streaming traffic in the Internet [?]. In contrast to traditional metrics such as transfer completion time for web requests, delivering a good application-level experience for video introduces new metrics such as low buffering or smooth bitrate delivery [?].

To meet these new application-level quality of experience goals, video players use dynamic bitrate adaptation within a viewing session. Here, the video is chunked into discrete segments, and each chunk is encoded at different bitrate levels. This enables the player to dynamically change the bitrate chosen for future video chunks in response to the operating conditions [13, 23]. Note that in this setting, delivering good application performance depends on the “consistency” of TCP throughput behavior within the session between the client and the video server, rather than on the burst or average properties of the Internet path.

In this respect, understanding intra-session TCP throughput characteristics can improve our understanding of existing video adaptation strategies (e.g., [23]) and inform the development of new algorithms (e.g., [23, 25]). Specifically, there are two key questions that we wish to address:


Stability: If the TCP throughput is stable, then adaptive video streaming algorithms can avoid frequent switches and pick the highest possible bitrate that does not induce buffering [14, 16, 23].

Predictability: Many adaptation algorithms use the estimated TCP throughput from previous chunks to choose the bitrates for the next few chunks. Recent work has shown that an accurate throughput predictor, if available, can significantly improve the quality of experience for adaptive video streaming [23, 25].


Despite the rich measurement literature characterizing various Internet path properties (e.g., [9, 12, 15, 21]), our understanding of TCP throughput stability and predictability is quite limited. There has been surprisingly little work in this space, and the closest related works we are aware of are dated and limited in scope [?].

Our goal in this paper is to bridge this gap. To this end, 
this paper makes three key contributions: 

• Measurement (§4): We analyze TCP throughput stability on a dataset consisting of minute-level throughput measurements from over 200K sessions from a large video provider. Our key findings are: (a) a large number of sessions have significant intra-session throughput variations; (b) high throughput sessions tend to be more stable; and (c) the throughput is more similar in neighboring/recent time slots and less similar to measurements made further apart.

• Prediction algorithm (§5): Building on the observed temporal structure, we develop a simple-yet-effective algorithm. It rests on the insight that the throughput can be modeled as a function of a hidden state variable: the number of concurrent flows at a bottleneck link. We develop a hidden Markov model (HMM) predictor and show that it outperforms a range of timeseries modeling techniques.

• Application implications (§6): Using trace-driven simulations, we show two things. Our HMM predictor significantly improves video QoE over prior work that does not use throughput predictions [13]. It also comes very close to the optimal achievable QoE, which is based on perfect knowledge of future throughput.

2 Related work 

In this section, we place our work in the context of past work 
in Internet measurement and adaptive video streaming. 

Measuring path properties: Prior work has measured the stability of path properties such as the persistence and prevalence of routes over time [?]. Other work focuses on interdomain routing stability and reports that popular destinations have stable routes [20]. In contrast, our focus is on throughput stability and predictability.

Bandwidth measurement tools: There are many tools for measuring the available bandwidth and the capacity of Internet paths (e.g., [?]). At a high level, they extend packet-pair techniques and provide mechanisms to deal with background traffic interference. We refer readers to the survey by Jain and Dovrolis for more in-depth comparisons [?]. However, these are active probes, each of which yields a single data point. In contrast, we use passive measurements to develop a systematic understanding of the temporal stability and predictability of TCP throughput.


Throughput stability: Balakrishnan et al. use throughput measurements from a large web service. They report that the throughput for the same client-server pair does not change significantly (by less than a factor of 2) for tens of minutes [7]. Zhang et al. analyze stability in terms of statistical, operational, and predictive metrics [24]. They report that recent history on the scale of minutes is useful to estimators, but history on the order of hours misleads them. Unfortunately, these studies are dated and limited in terms of scale and scope.¹


Throughput prediction: Prior work developed approximate analytical models of TCP throughput as a function of packet loss and delay [11, 17, 18]. However, these do not directly translate into actual prediction algorithms that can feed into video adaptation algorithms.


Broadband measurements: Given the recent debate on network neutrality and video, measurements of broadband characteristics have regained prominence [2, 21]. However, these do not focus on throughput stability and predictability.

Adaptive video streaming over HTTP: Our work is motivated by Dynamic Adaptive Streaming over HTTP (DASH). Prior work implicitly assumes that throughput is unstable and unpredictable. It therefore eschews throughput estimation in favor of using the player buffer occupancy to control bitrates [13]. Recent work [23, 25] argues that adaptive video streaming can significantly benefit from accurate throughput prediction. However, these works do not provide a concrete prediction algorithm. Our contribution is in developing an effective throughput predictor and demonstrating its utility for DASH.


3 Dataset 


In this section, we describe the dataset we use for analyzing 
TCP throughput stability and predictability. 

Note that in contrast to other throughput and path measurements, we need continuous measurements over sufficiently long durations (e.g., several minutes). We are not aware of public datasets that enable such in-depth analysis of throughput stability and predictability at scale. We explored datasets such as Glasnost [9], FCC [2], and one from an EU cellular provider [3]. Unfortunately, all of these had too few hosts, and the sessions lasted only a handful of seconds. This made them unusable for the stability and predictability analysis of interest for adaptive video streaming.

Our dataset is collected from the operational CDN platform of PPTV [5]. PPTV is a leading online video content provider in China with more than 227 million users. We use measurements from real video sessions from this provider. These sessions cover 428,000 unique client IPs and over 1000 unique server IPs. The clients span 508 cities and 28 ISPs in China. In total, we collect data from over 2.7 million sessions over a 4-day period.

¹ These were performed in the late 90s and early 2000s and pre-date the widespread deployment of high-speed broadband, CDNs, and the growth of Internet video. Furthermore, they focus on a handful of source-destination pairs mostly located in universities.

Figure 1: CDF of session duration and throughput.

Each session consists of several video “chunks”. The session is divided into 1-minute epochs, and the client reports the average TCP throughput observed within each epoch during the active download times. If multiple chunks are transmitted in the same epoch, the throughput reported for this epoch is the byte-weighted mean of the average throughput of each chunk. Conversely, if a chunk spans multiple epochs, it contributes partially to each epoch it spans.
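The per-epoch aggregation described above can be sketched as follows. This is a minimal illustration, not PPTV's client code. The chunk record format `(start_s, end_s, nbytes)` and the function name `epoch_throughputs` are hypothetical. The text does not say how a spanning chunk is split, so we assume each chunk contributes bytes to an epoch in proportion to its download time inside that epoch.

```python
from collections import defaultdict

EPOCH_S = 60.0  # 1-minute epochs, as in the dataset

def epoch_throughputs(chunks):
    """Aggregate chunk downloads into per-epoch throughput.

    chunks: iterable of (start_s, end_s, nbytes) download intervals
            (a hypothetical record format; requires end_s > start_s).
    Returns {epoch_index: byte-weighted mean throughput in bits/s}.
    """
    weighted = defaultdict(float)  # sum of (bytes in epoch * chunk throughput)
    total = defaultdict(float)     # sum of bytes in epoch
    for start, end, nbytes in chunks:
        duration = end - start
        thr = 8.0 * nbytes / duration  # average throughput of this chunk (bits/s)
        e = int(start // EPOCH_S)
        while e * EPOCH_S < end:
            lo = max(start, e * EPOCH_S)
            hi = min(end, (e + 1) * EPOCH_S)
            # Assumption: bytes are spread uniformly over the chunk's download time.
            share = nbytes * (hi - lo) / duration
            weighted[e] += share * thr
            total[e] += share
            e += 1
    return {e: weighted[e] / total[e] for e in sorted(total)}
```

For example, a 20-second chunk downloaded between seconds 50 and 70 contributes half of its bytes to epoch 0 and half to epoch 1 under this assumption.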

As observed in other studies, the duration of each session is variable. Figure 1a shows the CDF of the session duration in our dataset. Since we are interested in temporal stability and predictability, we focus on sessions that last more than 6 minutes. About 10% of the sessions last more than 6 minutes, which still yields a substantial number of sessions (> 200K) for our analysis. Figure 1b shows the CDF of the per-epoch average throughput and suggests that the average throughput distribution is similar to residential broadband characteristics [21]. This is indeed a single dataset from the Chinese Internet. Still, based on these observations and our experience with other datasets of a similar nature (e.g., [10]), we believe that it is representative of video workloads measured in residential broadband settings.

We do acknowledge one limitation: the finest time resolution we have is 1 minute. However, we believe that understanding stability and predictability at a minute timescale is still valuable for adaptive video streaming applications. As we will show in §6, it can still yield significant improvements in quality of experience.


4 Intra-session throughput analysis 

In this section, we analyze three key characteristics of the 
throughput within a client-server session: 

1. How variable is the throughput within a session? 

For instance, if the variability is small, then the adaptation logic does not have to switch bitrates often.

2. Is the variability correlated or anti-correlated with the average throughput?

If the variability is a function of the average throughput, then we may need to customize the adaptation logic for different deployments; e.g., wireless clients vs. fiber-to-the-home links.

3. Are there temporal patterns within the session; e.g., how similar are observations made k minutes apart?

This temporal structure has key implications for predictability, as many adaptation algorithms use estimates of throughput over the next few chunks as part of their decision logic [23].

Figure 2: Analyzing throughput stability through different metrics.

Intra-session variability: First, we compute the standard deviation (“stddev”) of TCP throughput across different measurements within the session. Figure 2a shows the CDF (across sessions) of the per-session throughput stddev. We see that about 20% of sessions have a stddev > 2 Mbps. Second, we compute the coefficient of variation, which is the ratio of the stddev to the mean. Figure 2b shows the CDF of this normalized metric; we see that roughly 40% of sessions have a normalized stddev > 50%. Now, the stddev could still be biased by a few outliers even if the throughput is mostly stable.² Thus, we also compute the difference between the 75th and 25th percentile throughput values within a session and plot the CDF in Figure 2c. Again, we see that a non-trivial fraction of sessions (> 30%) have a difference of > 2 Mbps. In short, this result confirms the general perception that we need good bitrate adaptation strategies and that simple static bitrate selection will not suffice.
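The three per-session metrics above (stddev, coefficient of variation, and the 75th minus 25th percentile spread) can be computed as in the sketch below. It makes two assumptions the text does not state: the population stddev (numpy's default `ddof=0`) and numpy's default linear-interpolation percentiles. The session values reuse the outlier example of footnote 2.

```python
import numpy as np

def variability_metrics(tput):
    """Per-session variability metrics for one throughput time series (any unit)."""
    x = np.asarray(tput, dtype=float)
    std = float(np.std(x))  # population stddev (ddof=0), an assumption
    q25, q75 = np.percentile(x, [25, 75])
    return {
        "stddev": std,
        "cv": std / float(np.mean(x)),  # coefficient of variation (normalized stddev)
        "iqr": float(q75 - q25),        # 75th minus 25th percentile
    }

m = variability_metrics([2, 2, 2, 2, 20])
# stddev is about 7.2 and cv about 1.29, yet iqr == 0.0
```

The percentile spread is zero for this mostly stable session, whereas the stddev is inflated by the single outlier. This is why the percentile difference is reported alongside the stddev.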

Variability vs. average throughput: Next, we analyze whether there is a relationship between throughput stability and the average session throughput. Based on the distribution in Figure 1b, we categorize the PPTV sessions into 800 Kbps bins. Figure 3 shows the average normalized stddev of the sessions within each bin. As a general trend, the normalized variability decreases with increasing average throughput. We posit that such high throughput sessions traverse less congested paths, and thus the variability of their throughput is also small. This result suggests that the throughput is more stable for higher throughput sessions. Bitrate adaptation algorithms can thus afford to be less conservative for these sessions than for low throughput sessions.
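The binning behind Figure 3 can be sketched as below, as a rough illustration only. The input format (one list of per-epoch throughputs in Kbps per session) and the function name `cv_by_throughput_bin` are hypothetical. Bins are keyed by their lower edge, and the normalized stddev uses the population stddev as in the previous sketch.

```python
import numpy as np
from collections import defaultdict

BIN_KBPS = 800  # bin width used in the analysis

def cv_by_throughput_bin(sessions):
    """sessions: list of per-epoch throughput lists (Kbps), one per session.
    Returns {bin_lower_edge_kbps: mean normalized stddev of sessions in that bin}."""
    bins = defaultdict(list)
    for tput in sessions:
        x = np.asarray(tput, dtype=float)
        mean = x.mean()
        cv = x.std() / mean  # normalized stddev of this session
        bins[int(mean // BIN_KBPS) * BIN_KBPS].append(cv)
    return {edge: float(np.mean(cvs)) for edge, cvs in sorted(bins.items())}

result = cv_by_throughput_bin([[500, 500], [700, 300], [2000, 2000]])
# {0: 0.2, 1600: 0.0}
```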

Temporal structure: The above results provide an aggregate view of the variability within the session, but they do not shed light on its temporal structure. Such temporal structure can have key implications for predictability. For instance, consider two hypothetical sessions with the following measurements (in Mbps): (1) Session 1 = 1, 1, 1, 0.5, 0.5, 0.5 and (2) Session 2 = 1, 0.5, 1, 0.5, 1, 0.5. Both sessions have the same mean, stddev, and percentile difference. Intuitively, however, Session 1 is more predictable from recent history than Session 2.

² For instance, consider a session with measurements 2, 2, 2, 2, 20. This will have a very high stddev even though it is mostly stable.

Figure 3: Normalized stddev vs. average throughput (x-axis: average session throughput in Kbps).

To quantitatively analyze the temporal structure (i.e., how the throughput changes during the course of a session), we compute the autocorrelation of the throughput time series for different time shifts.³ Figure 4 summarizes the distribution (across sessions) of these autocorrelations as a box-and-whiskers plot. For each time lag, it depicts the median, the 25th and 75th percentiles, and the min/max values. While the autocorrelations are positive, we see a marked decrease as the lag increases. In other words, the throughput is more similar in recent time slots and less similar to measurements made far earlier or later.
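The difference between the two hypothetical sessions above shows up directly in the autocorrelation. The sketch below uses the common sample estimator, normalized by the session's own mean and variance; this estimator is our assumption, since the text gives only the definition. The function name `autocorrelation` is illustrative.

```python
import numpy as np

def autocorrelation(tput, max_lag):
    """Sample autocorrelation R(tau) for tau = 1..max_lag, using the session's own
    mean and variance (the usual biased sample estimator; assumed here)."""
    x = np.asarray(tput, dtype=float)
    d = x - x.mean()
    denom = float(np.dot(d, d))
    if denom == 0.0:
        raise ValueError("constant series: autocorrelation is undefined")
    if not 1 <= max_lag < len(x):
        raise ValueError("need 1 <= max_lag < len(tput)")
    return {lag: float(np.dot(d[:-lag], d[lag:])) / denom
            for lag in range(1, max_lag + 1)}

session1 = [1, 1, 1, 0.5, 0.5, 0.5]
session2 = [1, 0.5, 1, 0.5, 1, 0.5]
r1 = autocorrelation(session1, 2)  # lag 1: 0.5, lag 2: 0.0
r2 = autocorrelation(session2, 2)  # lag 1: about -0.83, lag 2: about 0.67
```

Session 1 has positive lag-1 autocorrelation, so the last few samples are informative about the next one. Session 2 alternates, which yields a strongly negative lag-1 value even though its mean, stddev, and percentile spread match Session 1.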

To give some visual intuition, we show the throughput time series of a representative client-server session in Figure 5. Here, we see that the throughput evolves during the course of a session, and thus the correlation between distant time slots tends to be lower.

Summary of key findings: Our throughput variability analysis shows that:

1. A large number of sessions have significant variations in their intra-session throughput, with normalized stddev > 50% for more than 40% of sessions.

2. High throughput sessions are generally more stable than low throughput sessions.

3. The throughput is more similar in recent measurements, and the similarity decays with higher lag.

³ The autocorrelation is defined as $R(\tau) = \frac{\mathbb{E}[(X_t - \mu)(X_{t+\tau} - \mu)]}{\sigma^2}$, where $X_t$ is the throughput at time slot $t$, $\mu$ is the mean value of the throughput for the whole session, $\sigma^2$ is its variance, and $\tau$ is the time lag in the time series.

Figure 4: Autocorrelation of the throughput over different time windows (25, 50, 75%ile).

Figure 5: An example of PPTV video session throughput.


5 Intra-session throughput prediction 


The previous section reveals significant throughput variation during a session and the need for good video bitrate adaptation schemes. Ideally, we could accurately predict the TCP throughput and use it to select bitrates for the next few chunks, optimizing user-perceived quality of experience [22, 23]. However, this is challenging. The limitations of existing prediction mechanisms (§5.1) have even motivated efforts that avoid throughput-based adaptation altogether [13]. In this section, we describe a simple but effective prediction mechanism motivated by the temporal structure in the throughput. Before we do so, we describe strawman solutions considered in the literature and their limitations in light of our observations.

5.1 Strawman solutions 


Our goal here is not to exhaustively enumerate all possible prediction algorithms. Rather, the models we consider are representative of classical time series models used in adaptive streaming proposals [11, 23].⁴ At a high level, a throughput prediction model can be viewed as a function of the observed throughputs over the previous p epochs. Let $W_t$ denote the observed throughput at epoch t, and let $\hat{W}_t, \ldots, \hat{W}_{t+\Delta}$ denote the estimates for the next $\Delta$ epochs.


⁴ We also tried “forecast” models that extrapolated trends, but these performed worse and are not shown.

• Last Sample (LS): In the simplest case, we simply use the previous observation; i.e., $\forall i \in [t, t+\Delta]: \hat{W}_i = W_{t-1}$. The problem with this approach is that a single sample is a very noisy estimator and thus may cause significant bitrate oscillations [16, 23].

• Arithmetic Mean (AM): To address the noise, we can consider “smoothing” using p measurements from history; i.e., $\forall i \in [t, t+\Delta]: \hat{W}_i = \frac{1}{p}\sum_{q=1}^{p} W_{t-q}$. However, there are still two fundamental problems. First, if we use a small p, outliers can still cause significant under- or overestimation. Second, if we use a large p, measurements made too far back in history may induce serious biases, as we saw in Figure 5.

• Harmonic Mean (HM): One way to minimize the impact of outliers in AM is to use a harmonic mean [16]: $\forall i \in [t, t+\Delta]: \hat{W}_i = \frac{p}{\sum_{q=1}^{p} 1/W_{t-q}}$. While this addresses the outlier problem, uncorrelated measurements too far back in history can still bias the predictions.

• Auto-regressive models (ARMA, AR): Auto-regressive moving average (ARMA) is a classical timeseries modeling technique [?]. The ARMA model assumes that $W_t$ has the following form: $W_t = a_0 + \sum_{j=1}^{p} a_j W_{t-j} + \sum_{j=1}^{q} b_j \epsilon_{t-j}$, where $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$ is i.i.d. Gaussian noise, independent of the past observations $W_{t-1}, W_{t-2}, \ldots$. Here p and q are the sizes of the sliding windows for auto-regression and moving average, respectively. The parameters $\theta_{ARMA} = \{\{a_i\}_{i=0}^{p}, \{b_i\}_{i=1}^{q}\}$ can be learned from training data (e.g., historical sessions). The auto-regression (AR) model is a simplified version of ARMA that assumes $W_t = a_0 + \sum_{j=1}^{p} a_j W_{t-j} + \epsilon_t$. Here $a_0$ is a constant and $\epsilon_t$ is i.i.d. zero-mean Gaussian noise independent of the past observations, with parameters $\theta_{AR} = \{a_i\}_{i=0}^{p}$. Given training data and p, q, the Yule-Walker equations can be used to learn the parameters $\theta_{ARMA}$ or $\theta_{AR}$. The key problem with these models is that they make implicit independence and stationarity assumptions. However, Figures 4 and 5 suggest that there is some inherent “stateful” and “evolving” temporal structure in the throughput, which contradicts these assumptions.
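The four strawman predictors can be sketched as follows, as a minimal illustration under stated assumptions rather than the exact implementations evaluated here. LS, AM, and HM follow the formulas above. For AR, the sketch fits the coefficients by ordinary least squares with `numpy.linalg.lstsq`, pooled over training sessions. This swaps in least squares for the Yule-Walker equations mentioned in the text. It also predicts only one step ahead; multi-step AR would feed predictions back as inputs, which is not shown. All function names are hypothetical.

```python
import numpy as np

def predict_ls(history):
    """Last Sample: reuse the most recent observation W_{t-1}."""
    return float(history[-1])

def predict_am(history, p):
    """Arithmetic mean of the last p observations."""
    return float(np.mean(history[-p:]))

def predict_hm(history, p):
    """Harmonic mean of the last p observations (all must be > 0)."""
    w = np.asarray(history[-p:], dtype=float)
    return float(len(w) / np.sum(1.0 / w))

def fit_ar(sessions, p):
    """Fit W_t = a_0 + sum_{j=1..p} a_j W_{t-j} by ordinary least squares,
    pooling all training sessions. Returns [a_0, a_1, ..., a_p]."""
    rows, targets = [], []
    for s in sessions:
        for t in range(p, len(s)):
            rows.append([1.0] + [s[t - j] for j in range(1, p + 1)])
            targets.append(s[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef

def predict_ar(history, coef):
    """One-step-ahead AR prediction from the last p = len(coef) - 1 observations."""
    p = len(coef) - 1
    return float(coef[0] + sum(coef[j] * history[-j] for j in range(1, p + 1)))
```

For LS, AM, and HM, the single returned value would be used for every $i \in [t, t+\Delta]$. The harmonic mean requires strictly positive throughput samples.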


5.2 Using a Hidden Markov Model 


Hidden Markov models (HMMs) are widely used in many applications, ranging from speech recognition to event detection [8]. From a networking perspective, the intuition behind using an HMM in our context is that the throughput depends on a hidden state: the number of flows sharing the bottleneck link. The visualization in Figure 5 supports this intuition that the throughput exhibits stateful, evolving behavior. By capturing these state transitions and the dependency between the throughput and the hidden state, an HMM can yield more robust throughput predictions.

Figure 6: Error vs. HMM model size.

Model specification: Suppose the throughput depends on some hidden state variable $X_t \in \mathcal{X}$, where $\mathcal{X} = \{x_1, \ldots, x_M\}$ is the set of possible states and $M = |\mathcal{X}|$ is the number of states. The state evolves as a Markov process, where the likelihood of the current state depends only on the last state, i.e., $\mathbb{P}(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_1) = \mathbb{P}(X_t \mid X_{t-1})$. We denote the transition probability matrix by $\mathbf{P} = \{P_{ij}\}$, where $P_{ij} = \mathbb{P}(X_t = x_i \mid X_{t-1} = x_j)$. We let $\pi_t = (\mathbb{P}(X_t = x_1), \ldots, \mathbb{P}(X_t = x_M))$ denote the probability distribution vector. Then $\pi_{t+1} = \pi_t \mathbf{P}^T$. Each state “emits” the throughput expected within that state. Within each hidden state $X_t$, we model the throughput $W_t$ by a Gaussian distribution; i.e., $W_t \mid X_t = x \sim \mathcal{N}(\mu_x, \sigma_x^2)$.
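Because $P_{ij}$ conditions on the previous state $x_j$, each column of $\mathbf{P}$ (not each row) is a probability distribution. That is why the update is written $\pi_{t+1} = \pi_t \mathbf{P}^T$. The sketch below checks this convention on a hypothetical 3-state matrix; the matrix values and the helper name `propagate` are illustrative only.

```python
import numpy as np

# Hypothetical 3-state transition matrix using the convention above:
# P[i, j] = P(X_t = x_i | X_{t-1} = x_j), so each column sums to 1.
P = np.array([[0.8, 0.1, 0.0],
              [0.2, 0.8, 0.3],
              [0.0, 0.1, 0.7]])
assert np.allclose(P.sum(axis=0), 1.0)

def propagate(pi, P, steps=1):
    """Advance a state distribution: pi_{t+1} = pi_t P^T, applied `steps` times."""
    pi = np.asarray(pi, dtype=float)
    for _ in range(steps):
        pi = pi @ P.T
    return pi

pi_next = propagate([1.0, 0.0, 0.0], P)  # [0.8, 0.2, 0.0], the first column of P
```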

To see this concretely, let us revisit Figure 5. Here, we can conceptually think of splitting the time series into roughly 11 segments, each corresponding to a hidden state. Within each segment, the throughput is largely Gaussian. For example, between time slots 20 and 75 the throughput has mean 2900, and in slots 10-20 and 125-135 the mean is 2500.

Model learning: Given the number of states M, we can use training data to learn the parameters of the HMM, $\theta_{HMM} = \{\pi_0, \mathbf{P}, \{(\mu_x, \sigma_x^2), x \in \mathcal{X}\}\}$, via the expectation-maximization (EM) algorithm [8]. Note that the number of states M needs to be specified, and there is a tradeoff in choosing a suitable M. A smaller M yields simpler models, but they may be inadequate to represent the space of possible behaviors. On the other hand, a large M leads to a more complex model with more parameters, but may in turn lead to overfitting. We find empirically that M = 6 is a “sweet spot” in this tradeoff (Figure 6).

Online throughput prediction: At time t, we are given the past throughput $W_{1:t-1} = \{W_1, \ldots, W_{t-1}\}$. We first use the forward pass of the forward-backward algorithm [8] to determine $\pi_{t-1|1:t-1} = (\mathbb{P}(X_{t-1} = x_1 \mid W_{1:t-1}), \ldots, \mathbb{P}(X_{t-1} = x_M \mid W_{1:t-1}))$. The distribution of $X_{t+\tau}$ can then be obtained by $\pi_{t+\tau|1:t-1} = \pi_{t-1|1:t-1} (\mathbf{P}^T)^{\tau+1}$. Finally, we compute the maximum likelihood estimate of $W_{t+\tau}$, $\tau \geq 0$, as $\hat{W}_{t+\tau} = \mu_{x^*}$, where $x^* = \arg\max_{x \in \mathcal{X}} \mathbb{P}(X_{t+\tau} = x \mid W_{1:t-1})$.
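The three online steps above can be sketched with plain numpy, assuming the HMM parameters have already been learned; EM training is not shown. Computing $\pi_{t-1|1:t-1}$ only needs the forward (filtering) recursion, normalized at each step to avoid numerical underflow. The parameter values in the usage lines are hypothetical and not learned from the dataset. We also assume `pi0` is the state distribution at the first observed epoch.

```python
import numpy as np

def gaussian_pdf(w, mu, var):
    """Density of N(mu, var) at w, evaluated elementwise over the states."""
    return np.exp(-((w - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def filter_states(history, pi0, P, mu, var):
    """Forward recursion: returns pi_{t-1|1:t-1}, i.e. P(X_{t-1} = x_i | W_{1:t-1}).
    P[i, j] = P(X_t = x_i | X_{t-1} = x_j); pi0 is assumed to be the state
    distribution at the first observed epoch."""
    alpha = np.asarray(pi0, dtype=float) * gaussian_pdf(history[0], mu, var)
    alpha /= alpha.sum()
    for w in history[1:]:
        alpha = (alpha @ P.T) * gaussian_pdf(w, mu, var)  # predict, then weight by emission
        alpha /= alpha.sum()                               # renormalize to avoid underflow
    return alpha

def predict(history, tau, pi0, P, mu, var):
    """Predict W_{t+tau} (tau >= 0) as the mean of the most likely future state."""
    belief = filter_states(history, pi0, P, mu, var)
    future = belief @ np.linalg.matrix_power(P.T, tau + 1)  # pi_{t+tau|1:t-1}
    return float(mu[np.argmax(future)])

# Hypothetical 2-state model: "low" (1000 Kbps) and "high" (3000 Kbps) states.
P = np.array([[0.9, 0.1], [0.1, 0.9]])
mu = np.array([1000.0, 3000.0])
var = np.array([100.0 ** 2, 100.0 ** 2])
pi0 = np.array([0.5, 0.5])
next_w = predict([1000, 1010, 990, 2990, 3010], 0, pi0, P, mu, var)  # 3000.0
```

Returning the mean of the single most likely state quantizes the forecast to the M state means. This quantization is the bias discussed in §6.1 for very stable sessions.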

6 Evaluation 

In this section, we present trace-driven evaluations using the dataset in §3. We compare our proposed HMM scheme with the strawman approaches along two dimensions: (1) prediction accuracy and (2) video quality of experience.

6.1 Improvement in prediction accuracy 

Setup: To learn the various parameters (e.g., $\theta_{ARMA}$, $\theta_{HMM}$), we divide the dataset into equally-sized training and testing datasets. We learn these parameters from the training dataset and report error metrics on the testing dataset. For the AR model, we empirically tried different p values on the training dataset and found that p = 5 yields the best result.
Figure 7: Prediction accuracy of different models: (a) 90%ile error per session, median across sessions; (b) error distribution (25, 50, 75%ile). LS = Last Sample, AM = Arithmetic Mean, HM = Harmonic Mean, AR = AutoRegressive, ARMA = AutoRegressive Moving Average.

Error metric: For each slot t of a session s, we compute the absolute normalized error $\mathrm{Err}_{s,t} = \frac{|\hat{W}_{s,t} - W_{s,t}|}{W_{s,t}}$, where $\hat{W}_{s,t}$ and $W_{s,t}$ denote the predicted and true throughput for slot t of session s. We can summarize these “atomic” error values within and across sessions in different ways. Examples include the median per session and the median across sessions, or the median per session and the 90th percentile across sessions.
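The per-slot error and the two-level summarization can be sketched as below. We assume the error is normalized by the true throughput, and the helper names are hypothetical. The defaults (90th percentile within a session, median across sessions) correspond to the summary plotted in Figure 7a.

```python
import numpy as np

def normalized_errors(pred, true):
    """Absolute normalized error |W_hat - W| / W for each slot of one session."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    return np.abs(pred - true) / true

def summarize(sessions, within=90, across=50):
    """Percentile `within` of each session's errors, then percentile `across`
    of those per-session values (default: median of per-session 90th percentiles).
    sessions: list of (predicted, true) sequences, one pair per session."""
    per_session = [np.percentile(normalized_errors(p, t), within) for p, t in sessions]
    return float(np.percentile(per_session, across))
```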
Configuring HMM: One natural question about the HMM is how many states M we need in practice. More complex models with more states can decrease the error, but they also increase the training time and the risk of overfitting. Figure 6 shows the testing error for HMMs with varying numbers of states. We see that while the error decreases with more complex state models, the returns diminish beyond 6 states. As a practical tradeoff between the above considerations, we choose a 6-state HMM.

HMM vs. Strawman solutions: Figure 7 considers two possible ways to summarize the per-slot error values. Figure 7a shows the median across sessions of the “tail” 90th-percentile prediction error within a session, and Figure 7b shows the overall distribution of the per-slot error values. In both cases, we see that the HMM model clearly outperforms the other techniques. For instance, in Figure 7a the HMM approach has a 60% improvement over the second-best predictor (AR). Similarly, in the overall distribution we see that the HMM dramatically reduces the tail of the errors. For example, more than 75% of the HMM predictions have an error below 18%, compared to above 27% for the other models. We also considered other summarizations, such as the median across sessions of the per-session median and the average of averages. These gave consistent results: HMM significantly outperforms the strawman models (not shown). We expect the benefits of HMM predictors to be even larger at finer time scales, e.g., second-level instead of minute-level throughput prediction.

One minor downside is that the “low tail” (25th percentile) error of HMM is worse than that of the last-sample predictor. This is due to some highly stable sessions where the throughput is constant, and thus the last-sample predictor has zero error. Due to the quantization effect with only 6 states, there is a small bias in HMM predictions. However, as we will see next, this has no impact on the application quality of experience.
Figure 8: Adaptive video QoE improvement using throughput prediction; BB refers to the pure buffer-based adaptation that ignores throughput [13].


6.2 Improvement in video QoE

Next, we evaluate the improvement in user quality of experience (QoE) gained by using the improved HMM-based throughput prediction in the context of dynamic adaptive streaming over HTTP (DASH) [16, 23].

Setup: Our goal is to evaluate the benefit of improved throughput prediction via HMM, not to evaluate specific video adaptation heuristics or artifacts. To this end, for the adaptation algorithm we follow the strategies formulated by recent efforts [23, 25]. These take as input throughput predictions for the next few epochs (e.g., via harmonic mean) and solve an exact integer linear programming optimization to decide the bitrate for the next chunk. As a baseline, we also consider the buffer-based (BB) policy, which does not use any throughput prediction [13].

QoE metric: Identifying suitable QoE functions for video is an open problem [6]. Here, we adopt a simple linear model suggested by previous work [23]. It is a weighted sum of different factors such as average video quality, average quality variation, and total rebuffer time. We compute a normalized QoE metric for each algorithm relative to the theoretical optimum, which could be achieved with perfect knowledge of future throughput.

QoE improvement: Figure 8 shows the CDF of the normalized QoE of the different approaches. For clarity, we focus on a subset of predictors, since the lines of the other strawman solutions are very close to those of the harmonic mean (HM) and AR. First, the result confirms observations from prior work that accurate prediction can dramatically improve QoE over the baseline buffer-based approach [13, 23]. Second, the improved prediction accuracy of HMM also leads to the best QoE, especially in the lower tail. For example, the gap between the 20th-percentile QoE of HMM and that of the harmonic mean suggested in prior work [16] is almost 25%.⁵ Third, the HMM-based approach is also very close to the optimal QoE achievable with perfect knowledge, with its median being 90% of the optimal.

7 Conclusions

Our need to understand throughput stability and predictability is motivated by adaptive streaming over HTTP [22, 23, 25]. There is surprisingly little work on this topic. Large-scale datasets on “long-lived” sessions, with the continuous throughput measurements needed to shed light on these aspects, appear to be especially scarce.⁶ Our work bridges this gap in two ways. It (1) provides a large-scale measurement analysis of intra-session throughput stability and (2) proposes an online prediction mechanism based on a hidden Markov model. We hope that our work inspires further research on this topic at more fine-grained timescales and across different deployment scenarios (e.g., cellular).


⁵ One subtle issue is that even though AR is worse than Harmonic Mean in terms of prediction error, its QoE distribution is better. This is due to a combination of two factors. First, the AR algorithm tends to be conservative and underestimates throughput; thus its rebuffering is low. Second, in our normalized QoE, rebuffering has a relatively high weight. Together, these factors make AR's QoE better.